feat(_mapping): timestamp pushdown + column-hints fast path (on top of #6439)#6443
feat(_mapping): timestamp pushdown + column-hints fast path (on top of #6439)#6443congx4 wants to merge 8 commits into
Conversation
fef38b8 to
5bab546
Compare
e8a723e to
0cd3d6c
Compare
Two small additions to the ES-compat `_mapping(s)` endpoint that
together let downstream callers (e.g. Trino's ES connector) skip the
expensive `list_fields` scan in the common case.
Today `GET /_elastic/{index}/_mapping(s)` calls `list_fields` over every
published split. On indexes with hundreds of thousands of dynamic
fields this can take several seconds and runs into
`QW_FIELD_LIST_SIZE_LIMIT` (100k by default). This PR addresses both
pieces with no proto change:
1. Timestamp pushdown
New `?start_timestamp=…&end_timestamp=…` URL params on `_mapping(s)`,
forwarded into `ListFieldsRequest` verbatim. The metastore prunes
the candidate split set by time window before any leaf fan-out.
Unit is epoch seconds, half-open interval — matching the existing
`ListFieldsRequest` proto contract.
2. Column-hints fast path
New `?fields=…` URL param (comma-separated names). When every
requested name is a flat literal (no `*`, no `?`, no `.`) declared
in the union of the indexes' `doc_mapping`, the handler builds the
response straight from the declared mapping, filtered to those
names. No `list_fields` call, no split I/O.
Anything else (wildcards, dotted paths, names not in `doc_mapping`)
falls through to the full-mapping path: `list_fields` over the
splits in the time range, full unfiltered mapping returned — same
shape as today, just with the timestamp-pushdown optimization
applied.
Notes:
- Unknown query params are silently ignored (no `deny_unknown_fields`)
to stay compatible with standard ES clients that pass `pretty`,
`ignore_unavailable`, `allow_no_indices`, etc.
- No proto change. Stays on existing `ListFieldsRequest`.
- `IndexMappingQueryParams` parser and the new
`ElasticsearchMappingsResponse::from_doc_mapping_filtered` are
unit-tested in their respective modules.
- rename `fields` query param to `field_patterns` to mirror `ListFieldsRequest.field_patterns` - switch tests to `serde_qs` (already a workspace dep, matches the bulk-query-params test style) - move the declared-field filter out of `mappings.rs` and into the rest_handler fast path: trim `field_mappings` in place before calling `from_doc_mapping`, dropping `from_doc_mapping_filtered` entirely (dynamic fields were already filtered at the leaves via `ListFieldsRequest.field_patterns`) - nits in `parse_field_patterns`: trim/filter before allocating the owned String per token - nits in `collect_declared_top_level_names`: functional flat_map style
5bab546 to
2f719f6
Compare
- drop the fast-path declared-field `retain(...)` in rest_handler. `field_patterns` is now hint-only: it triggers the fast path (skip `list_fields`) when every pattern matches a flat declared field, and is pushed down to the leaves for dynamic-field filtering. Both fast and slow paths now return the full declared schema, matching slow- path semantics that existed before. - remove unused `serde_urlencoded` dev-dep from `quickwit-serve` and the workspace `Cargo.toml` (was already unused after switching tests to `serde_qs`). - `collect_declared_top_level_names`: switch back to a procedural form preallocated with `HashSet::with_capacity(sum)` to signal the upper bound — no filtering, no explosion.
The fast path (skip list_fields when all field_patterns are declared flat fields) will be implemented in pomsky. This commit reduces the handler to always call root_list_fields, forwarding field_patterns, start_timestamp, and end_timestamp as pushdown hints.
There was a problem hiding this comment.
💡 Codex Review
https://github.com/quickwit-oss/quickwit/blob/b208d212bd3d9863f6713812a0e8c8f1199d4276/quickwit-serve/src/elasticsearch_api/model/index_mapping_query_params.rs#L31-L32
Honor the advertised
fields parameter
The new mapping fast path is described as being enabled by ?fields=..., but this query struct only deserializes field_patterns; because unknown params are explicitly ignored here, requests using GET /_elastic/{index}/_mapping?fields=host will leave field_patterns as None, fan out to every dynamic field, and can still hit the list-fields size/latency problem this change is meant to avoid. Add a serde alias/rename for fields or update the API contract and callers accordingly.
https://github.com/quickwit-oss/quickwit/blob/b208d212bd3d9863f6713812a0e8c8f1199d4276/quickwit-serve/src/elasticsearch_api/rest_handler.rs#L241-L244
Surface invalid field patterns instead of dropping dynamics
Passing user-supplied patterns into ListFieldsRequest without validating them means inputs such as field_patterns=a** hit FieldPatterns::from_strs in the leaf and return an InvalidArgument, but this handler later converts any root_list_fields error to None via .ok(). In that scenario the endpoint returns 200 with only declared mappings and silently omits all dynamic fields, so validate the patterns before dispatching or propagate this specific error as a bad request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- move parse_field_patterns into IndexMappingQueryParams::field_patterns() - return Vec<String> directly instead of Option<Vec<String>>; caller no longer needs unwrap_or_default - use filter_map to trim and drop empties in one pass - split final Ok(...) into let response + Ok(response)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 010d7f26f6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Callers using the documented `?fields=` parameter (e.g. the Trino ES connector) would get an empty ListFieldsRequest.field_patterns, causing leaves to enumerate all dynamic fields. Adding the serde alias makes both `?fields=` and `?field_patterns=` map to the same field.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d9a120b4d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| field_patterns: Vec::new(), | ||
| start_timestamp: None, | ||
| end_timestamp: None, | ||
| field_patterns: params.field_patterns(), |
There was a problem hiding this comment.
Propagate invalid field pattern errors
When fields/field_patterns contains a pattern that the list-fields layer rejects (for example an ES-style pattern with more than one *, which FieldPattern::from_str returns as InvalidArgument), this forwards the user input into root_list_fields but the handler still converts any error to None with .ok() below. In that scenario the request succeeds with only declared mappings and silently drops all matching dynamic fields, making the new filtered mapping endpoint return incomplete results instead of reporting the bad filter.
Useful? React with 👍 / 👎.
InvalidArgument from root_list_fields (e.g. a pattern with more than one wildcard) now propagates as an ElasticsearchError instead of being swallowed by .ok(). Other failures (infrastructure, timeout) still degrade gracefully to a declared-only mapping response.
28cf128 to
46e8ce9
Compare
What's in this PR
Today
GET /_elastic/{index}/_mapping(s)callslist_fieldsover every published split. On indexes with hundreds of thousands of dynamic fields this can take several seconds, and over a certain threshold the leaf hitsQW_FIELD_LIST_SIZE_LIMIT(100k by default) and the request fails.Timestamp pushdown — new
?start_timestamp=…&end_timestamp=…URL params on_mapping(s), forwarded intoListFieldsRequestverbatim. The metastore prunes the candidate split set by time window before any leaf fan-out. Unit is epoch seconds, half-open interval — matching the existingListFieldsRequestproto contract.field_patterns — new
?fields=…URL param (comma-separated names). Push fields patterns to leaf_list_fields to reduce the workload(sort, merge).Test plan
cargo build -p quickwit-serve— cleancargo clippy -p quickwit-serve --tests— cleancargo +nightly fmt --all -- --check— cleancargo nextest run -p quickwit-serve --lib elasticsearch_api::— 75 / 75 pass🤖 Generated with Claude Code